Shahad Almubarak

The data has 1,599 observations of 12 variables describing red wine.

Univariate Plots Section

## [1] 1599   12
##  [1] "fixed.acidity"        "volatile.acidity"     "citric.acid"         
##  [4] "residual.sugar"       "chlorides"            "free.sulfur.dioxide" 
##  [7] "total.sulfur.dioxide" "density"              "pH"                  
## [10] "sulphates"            "alcohol"              "quality"
## Observations: 1,599
## Variables: 12
## $ fixed.acidity        <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.…
## $ volatile.acidity     <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0…
## $ citric.acid          <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0…
## $ residual.sugar       <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0…
## $ chlorides            <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0…
## $ free.sulfur.dioxide  <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, …
## $ total.sulfur.dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65…
## $ density              <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9…
## $ pH                   <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3…
## $ sulphates            <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0…
## $ alcohol              <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.…
## $ quality              <int> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5…
##  fixed.acidity   volatile.acidity  citric.acid    residual.sugar  
##  Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900  
##  1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900  
##  Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200  
##  Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539  
##  3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600  
##  Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500  
##    chlorides       free.sulfur.dioxide total.sulfur.dioxide
##  Min.   :0.01200   Min.   : 1.00       Min.   :  6.00      
##  1st Qu.:0.07000   1st Qu.: 7.00       1st Qu.: 22.00      
##  Median :0.07900   Median :14.00       Median : 38.00      
##  Mean   :0.08747   Mean   :15.87       Mean   : 46.47      
##  3rd Qu.:0.09000   3rd Qu.:21.00       3rd Qu.: 62.00      
##  Max.   :0.61100   Max.   :72.00       Max.   :289.00      
##     density             pH          sulphates         alcohol     
##  Min.   :0.9901   Min.   :2.740   Min.   :0.3300   Min.   : 8.40  
##  1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500   1st Qu.: 9.50  
##  Median :0.9968   Median :3.310   Median :0.6200   Median :10.20  
##  Mean   :0.9967   Mean   :3.311   Mean   :0.6581   Mean   :10.42  
##  3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300   3rd Qu.:11.10  
##  Max.   :1.0037   Max.   :4.010   Max.   :2.0000   Max.   :14.90  
##     quality     
##  Min.   :3.000  
##  1st Qu.:5.000  
##  Median :6.000  
##  Mean   :5.636  
##  3rd Qu.:6.000  
##  Max.   :8.000

Observations from the Summary

1. The dataset consists of 1,599 rows and 12 columns.

2. The range of fixed acidity is quite wide, with a minimum of 4.60 and a maximum of 15.90.

3. Quality ranges from 3 to 8, with a mean of 5.636.

4. pH ranges from 2.740 to 4.010, with a median of 3.310.

5. Alcohol ranges from 8.40 to 14.90, with a mean of 10.42.
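The figures in this list come straight from R's structural helpers. The following is a minimal sketch rather than the original analysis code: `wine_head` is a hypothetical data frame holding only the first eight rows of four columns printed above, so its summary values differ from the full-data summary.

```r
# Toy data: the first eight rows of four columns, copied from the glimpse
# output above. The name `wine_head` is an assumption for illustration.
wine_head <- data.frame(
  fixed.acidity  = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3),
  residual.sugar = c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2),
  alcohol        = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0),
  quality        = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L)
)

# dim() reports rows first, then columns
dims <- dim(wine_head)
print(dims)
print(names(wine_head))
print(summary(wine_head))

# Minimum (row 1) and maximum (row 2) of every column
ranges <- sapply(wine_head, range)
print(ranges)
```

Because `dim()` returns rows before columns, the `1599   12` in the output above reads as 1,599 rows and 12 columns.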

Quality

The quality ratings in the red wine dataset appear roughly normally distributed, and most of the wines have a quality rating of 5 or 6.
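A frequency table is a quick way to check a claim like this alongside the plot. The sketch below uses only the first 15 quality scores shown in the glimpse output, stored in a hypothetical vector `quality_head`, so the counts illustrate the method and are not the full-data distribution.

```r
# First 15 quality scores, copied from the glimpse output above.
# `quality_head` is an illustrative name, not part of the original analysis.
quality_head <- c(5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5)

# Count how often each rating occurs, then convert counts to proportions
counts <- table(quality_head)
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))
```

Run on the full `quality` column, the same two calls would give the share of wines at each rating.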

Now I will examine the chemical variables, starting with acidity:

fixed.acidity

The distribution of fixed acidity is right-skewed. The middle half of the values lies between 7.1 and 9.2, with a median of 7.9.

volatile.acidity

The distribution of volatile.acidity is right-skewed, and most values lie between 0.4 and 0.6.

citric.acid

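The distribution of citric.acid is right-skewed. The middle half of the values lies between 0.09 and 0.42, with a median of 0.26. The values at 0 stand out in the plot, although with a first quartile of 0.09 they are not outliers by the usual 1.5 × IQR rule.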

residual.sugar

The distribution of residual sugar is also right-skewed, and most values are around 2.

chlorides

The distribution of chlorides is right-skewed, and most values lie between 0.07 and 0.09.

free.sulfur.dioxide

The distribution of free.sulfur.dioxide is right-skewed, with its peak around 7 and a median of 14.

total.sulfur.dioxide

The distribution of total.sulfur.dioxide is right-skewed, with its peak around 21 and a median of 38.

density

The distribution of density is approximately normal, centered just below 1 (mean 0.9967).

pH

The distribution of pH is approximately normal, centered around 3.3.

sulphates

The distribution of sulphates is right-skewed, with its peak around 0.55 and a median of 0.62.

alcohol

The distribution of alcohol is right-skewed, with its peak between 9.4 and 9.6 and a median of 10.2.
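The skew described for each variable can be cross-checked against the summary table: in a right-skewed distribution the mean is usually pulled above the median. The sketch below is a rough heuristic under that assumption, not a formal skewness test. All numbers are copied from the summary output above; the data frame `stats` and the column `scaled_gap` are illustrative names.

```r
# Mean, median and quartiles copied from the summary output above.
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide", "density", "pH", "sulphates", "alcohol"),
  q1     = c(7.10, 0.39, 0.09, 1.90, 0.07, 7, 22, 0.9956, 3.21, 0.55, 9.50),
  median = c(7.90, 0.52, 0.26, 2.20, 0.079, 14, 38, 0.9968, 3.31, 0.62, 10.20),
  mean   = c(8.32, 0.5278, 0.271, 2.539, 0.08747, 15.87, 46.47, 0.9967, 3.311,
             0.6581, 10.42),
  q3     = c(9.20, 0.64, 0.42, 2.60, 0.09, 21, 62, 0.9978, 3.40, 0.73, 11.10),
  stringsAsFactors = FALSE
)

# Gap between mean and median, scaled by the interquartile range so that
# variables measured on different scales can be compared.
stats$scaled_gap <- (stats$mean - stats$median) / (stats$q3 - stats$q1)

# Largest positive gaps first: these are the strongest candidates for right skew
ordered <- stats[order(stats$scaled_gap, decreasing = TRUE),
                 c("variable", "scaled_gap")]
print(ordered, row.names = FALSE)
```

A gap near zero, as for density and pH, is consistent with a roughly symmetric distribution. The histograms remain the primary evidence, since a small gap does not rule out skew.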

Next, I will look at the correlations between variables.

Univariate Analysis

What is the structure of your dataset?

This dataset contains 1,599 wines with 12 variables, all of them numerical. One of them, quality, is the output variable, and the other 11 are chemical properties (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol).

What is/are the main feature(s) of interest in your dataset?

The main feature of interest is quality: do these chemical properties affect red wine quality or not?

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

In my opinion, chemical properties such as alcohol, citric acid, fixed acidity, and volatile acidity may affect the quality of the wine.

Did you create any new variables from existing variables in the dataset?

I haven’t created any new variables so far.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

The dataset was already tidy, so there was no need to change its format.

Bivariate Plots Section

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$alcohol and redwine$quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663

There is a moderate positive correlation between alcohol and quality (r = 0.476): as alcohol increases, quality tends to increase too.
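The t statistic and confidence interval in the test output follow from the correlation coefficient and the sample size alone. As a worked check, the sketch below rebuilds them from the reported r = 0.4761663 and n = 1599, using the Fisher z-transform for the interval. `cor_summary` is a hypothetical helper written for this illustration, not a function from R or from the original analysis.

```r
# Hypothetical helper: rebuilds the t statistic and 95% interval that a
# Pearson correlation test reports for a correlation r on n observations.
cor_summary <- function(r, n) {
  df <- n - 2
  # t statistic for testing whether the true correlation is zero
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  # Fisher z-transform: atanh(r) is approximately normal with sd 1 / sqrt(n - 3)
  z <- atanh(r)
  half_width <- qnorm(0.975) / sqrt(n - 3)
  ci <- tanh(c(z - half_width, z + half_width))
  c(t = t_stat, df = df, lower = ci[1], upper = ci[2])
}

# Values taken from the alcohol and quality test output above
alcohol_quality <- cor_summary(r = 0.4761663, n = 1599)
print(round(alcohol_quality, 4))
```

The same call with the other reported coefficients (for example `r = -0.3905578` for volatile acidity) should reproduce their test output in the same way.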

citric acid and quality relationship

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$citric.acid and redwine$quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1793415 0.2723711
## sample estimates:
##       cor 
## 0.2263725

There is a weak positive correlation between citric acid and quality (r = 0.226).

Fixed acidity and quality relationship

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$fixed.acidity and redwine$quality
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07548957 0.17202667
## sample estimates:
##       cor 
## 0.1240516

There is a weak positive correlation between fixed acidity and quality (r = 0.124).

Volatile acidity and quality relationship

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$volatile.acidity and redwine$quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578

There is a moderate negative correlation between volatile acidity and quality (r = -0.391).

Alcohol and density relationship

## 
##  Pearson's product-moment correlation
## 
## data:  redwine$alcohol and redwine$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5322547 -0.4583061
## sample estimates:
##        cor 
## -0.4961798

As the graph shows, there is a moderate negative correlation between alcohol and density (r = -0.496).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Alcohol is related to both red wine quality and density, with a moderate correlation in each case: as alcohol increases, quality tends to increase and density tends to decrease.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

Alcohol: positive correlation with quality, negative correlation with density.

Quality: positive correlation with alcohol, negative correlation with volatile acidity.

Fixed.acidity: strong positive correlation with citric.acid and negative correlation with pH.

What was the strongest relationship you found?

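Among the relationships with quality, the strongest is with alcohol (r = 0.476). Of all the correlations tested above, the one between alcohol and density is slightly larger in magnitude (r = -0.496).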
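To compare the strength of the relationships directly, the coefficients reported in the Bivariate Plots Section can be ranked by absolute value. The sketch below does this for the five tested pairs only. The data frame `reported` and its labels are illustrative, and correlations that were not tested above (such as fixed acidity with citric acid or pH) are left out because their values are not shown.

```r
# Correlation coefficients copied from the five test outputs above.
reported <- data.frame(
  pair = c("alcohol ~ quality", "citric.acid ~ quality",
           "fixed.acidity ~ quality", "volatile.acidity ~ quality",
           "alcohol ~ density"),
  r = c(0.4761663, 0.2263725, 0.1240516, -0.3905578, -0.4961798),
  stringsAsFactors = FALSE
)

# r squared is the share of variance one variable explains in the other
reported$r_squared <- round(reported$r^2, 3)

# Rank by magnitude, ignoring the sign of the correlation
ranked <- reported[order(abs(reported$r), decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```

Ranking by `abs(r)` rather than `r` matters here, because a strong negative correlation would otherwise sort to the bottom of the list.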

Multivariate Plots Section

The graph suggests that the density of the wine has little effect on quality, whereas alcohol does have an effect.

It appears that higher fixed acidity goes together with lower pH, and that quality tends to be higher for such wines.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

I observed that wines with more alcohol and higher fixed acidity tend to have higher quality, while density alone does not have much effect on wine quality.

Were there any interesting or surprising interactions between features?

Wines with lower pH and higher fixed acidity tend to be of higher quality.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.


Final Plots and Summary

Plot One

Description One

Quality is the main interest of my exploration. We can see that most of the wines have an average quality rating of 5 or 6.

Plot Two

Description Two

The graph shows a moderate negative correlation between alcohol and density. As the previous graph showed, density does not have much effect on wine quality even though alcohol does: as alcohol increases, wine quality tends to increase.

Plot Three

Description Three

These three chemical properties all describe acidity in wine. As the graph shows, citric acid and fixed acidity each have a weak positive correlation with wine quality, which suggests they may have a slight effect on it. Volatile acidity, on the other hand, has a negative correlation with wine quality, which suggests that higher volatile acidity goes with lower quality.


Reflection

This dataset consists of 1,599 observations with 11 chemical properties. My first step was to understand the dataset and what it contains. The main question was then: “Do the chemical properties of wine affect its quality or not?” The main struggle was that this was my first time using R for exploration and analysis, so I had difficulty finding the proper code for each insight, and there were some small details of R that I had to learn to get this project done. I learned the basics from the Udacity lectures and used some YouTube videos to understand R. The surprising thing was that I did not have much background knowledge about wine properties, and I thought it was all about alcohol. After finishing this project, I can say that I have learned a good deal about red wine.

In future work, I would like to include other kinds of wine, such as white wine, to compare with red wine and determine which one has the higher quality.